🌻 The seamless workflow from AI interviews to causal map

An AI interviewer can successfully gather causal information at scale

Our AI interviewer was able to conduct multiple interviews with no researcher intervention and at low cost, reproducing the results of [@chopraConductingQualitativeInterviews2023; @anderssonTheoryChangeSustainable2024]. The interview transcripts read naturally, and the process appears to have been acceptable to the interviewees.

AI interviewing - beware of sensitive data

Ethics, bias and validity

This kind of AI processing is not suitable for dealing with sensitive data because information from the interviews passes to OpenAI’s servers, even though it is no longer used for training models [@openaiAnnouncingGPT4oAPI2024].

Using AI interviewing - beware of bias

[@headLargeLanguageModel2023] and [@reidVisionEquitableAI2023] raise concerns about bias and the importance of equity in AI applications for evaluation; these concerns have in turn led to questions about the validity of AI-generated findings [@azzamArtificialIntelligenceValidity2023]. The way the AI sees the world, the salient features it identifies, the words it uses to identify them, and its understanding of causation are certainly wrapped up in a hegemonic worldview (Bender et al., 2021). The groups most likely to be disadvantaged by this worldview are largely the same groups that have the least say in how these technologies are developed and employed.

AI is developing quickly: new models and techniques become available every month. However, we believe that any tools which genuinely add to knowledge should use procedures which are broken down into workflows consisting of simple individual steps so that humans can understand and check what is happening.

AI interviewing - beware of suitability

Interviewing

Researchers should carefully consider whether the interview subject matter is compatible with this kind of approach. For example, the AI may miss subtle cues or struggle to provide appropriate support to respondents expressing distress [@chopraConductingQualitativeInterviews2023; @rayChatGPTComprehensiveReview2023]. We recommend that interview guidelines are tested and refined by human interviewers before being automated. No automated interview can substitute for the contextual information which a human evaluator can gain by talking directly to a respondent, ideally face-to-face and in a relevant context.

There is likely to be a differential response rate in this kind of interview: some people are less likely to respond to an AI-driven interview than others, and this propensity may not be random.

AI interviewing - the evaluator retains responsibility

Autocoding

The AI coder and the clustering algorithms are not error-free. The coding of individual high-stakes causal links should be checked; in particular, there is a danger of accepting inaccurate results which look plausible.
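One way an evaluator might operationalise such checks is to sample the most heavily cited links for human review. The sketch below is illustrative only: the field name `n_sources` and the cut-off defining "high-stakes" are assumptions, not part of the actual pipeline.

```python
import random

def sample_links_for_review(links, k=5, seed=42):
    """Sample k coded causal links for human spot-checking.

    Prioritises 'high-stakes' links, taken here (hypothetically) to be
    those cited by the most sources, since errors there matter most.
    """
    # Rank links by how many sources mention them, most-cited first
    ranked = sorted(links, key=lambda l: l["n_sources"], reverse=True)
    # Keep the top quarter (at least 2k links) as the high-stakes pool
    high_stakes = ranked[: max(k * 2, len(ranked) // 4)]
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    return rng.sample(high_stakes, min(k, len(high_stakes)))

links = [
    {"cause": "drought", "effect": "crop failure", "n_sources": 14},
    {"cause": "training", "effect": "income", "n_sources": 9},
    {"cause": "income", "effect": "school fees paid", "n_sources": 6},
    {"cause": "roads", "effect": "market access", "n_sources": 2},
]
for link in sample_links_for_review(links, k=2):
    print(f'check: {link["cause"]} -> {link["effect"]}')
```

A fixed random seed means the same audit sample can be re-drawn by another researcher, which fits the reproducibility aims discussed below.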

This approach does not nurture substantive, large-scale theory-building of the kind expected, for example in grounded theory [@glaserDiscoveryGroundedTheory1967]. However, it can do smaller-scale theory-building in the sense of capturing theories implicit in individuals’ responses.

This pipeline relieves researchers of much of the work involved in coding but it is not fully autonomous. The human evaluator is responsible for applying the techniques in a trustworthy way and for drawing valid conclusions.

AI interviewing has potential - scalability, reach, reproducibility, causality

Qualitative approach: These procedures approach the stakeholder stories as far as possible without preconceived templates, to remain open to emerging and unexpected changes in respondents’ causal landscapes.

Scalability and reach: The AI’s ability to communicate in many languages presents an opportunity to reach more places and people, subject to internet access and the AI’s fluency in less common languages, and to include representative samples of populations.

The interview and coding processes are machine-driven and use zero temperature, so this approach should be mostly reproducible. Reproducibility opens the possibility of comparing results across groups, places and timepoints.
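For illustration, a zero-temperature request to a chat-completion API might be assembled as below. The model name, prompt, and use of a `seed` parameter are assumptions rather than the authors' actual configuration, and even zero-temperature decoding is only approximately deterministic in practice.

```python
def build_coding_request(transcript_chunk: str) -> dict:
    """Assemble a deterministic-leaning chat-completion request.

    Hypothetical sketch: model name and system prompt are placeholders.
    """
    return {
        "model": "gpt-4o",   # hypothetical model choice
        "temperature": 0,    # greedy decoding: always the most likely token
        "seed": 1,           # best-effort reproducibility across runs
        "messages": [
            {"role": "system",
             "content": "List causal claims as 'cause -> effect' pairs."},
            {"role": "user", "content": transcript_chunk},
        ],
    }

request = build_coding_request("We lost the harvest because of the drought.")
print(request["temperature"])  # 0
```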

The low cost of coding large amounts of information means that it is much easier to develop, compare and discard hypotheses and coding approaches, something which qualitative researchers have previously been understandably reluctant to do.

Qualitative causality: These procedures have the potential to help evaluators answer evaluation questions which are often causal in nature, like: understanding stakeholders' mental models; judging whether "their" ToC matches "ours"; investigating “how things work” for different subgroups of stakeholders; tracing impact from mentions of "our" intervention to outcomes of interest; triaging the key outcomes in stakeholders’ perspectives.
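The "tracing impact" question above can be treated as path-finding over the coded causal map. A minimal sketch, assuming the map is simply a list of (cause, effect) pairs with illustrative factor labels:

```python
from collections import deque

def trace_paths(edges, source, target, max_len=4):
    """Enumerate simple causal paths from source to target (breadth-first).

    'edges' is a list of (cause, effect) pairs from a coded causal map.
    """
    adj = {}
    for cause, effect in edges:
        adj.setdefault(cause, []).append(effect)
    paths, queue = [], deque([[source]])
    while queue:
        path = queue.popleft()
        if path[-1] == target:          # reached the outcome of interest
            paths.append(path)
            continue
        if len(path) >= max_len:        # cap path length to keep search small
            continue
        for nxt in adj.get(path[-1], []):
            if nxt not in path:         # skip cycles
                queue.append(path + [nxt])
    return paths

# Illustrative factor labels, not real coded data
edges = [
    ("our intervention", "training"),
    ("training", "income"),
    ("income", "school fees paid"),
    ("training", "confidence"),
]
for p in trace_paths(edges, "our intervention", "school fees paid"):
    print(" -> ".join(p))
# prints: our intervention -> training -> income -> school fees paid
```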

In summary, this kind of semi-automated pipeline opens up possibilities for monitoring, evaluation and social research which were unimaginable just three years ago and are well suited to today’s challenging, complex problems like climate change and political and social polarisation. Previously, only quantitative research claimed to produce generalisable knowledge about social phenomena validly and at scale, by turning meaning into numbers. Now perhaps qualitative research will eclipse quantitative research by bypassing quantification and dealing with meaning directly, in somewhat generalisable ways.

AI interviewing needs further work

We have tried to demonstrate a semi-automated workflow with which evaluators can capture stakeholders’ emergent views of the structure of a problem or program at the same time as capturing their beliefs about the contributions made to factors of interest by other factors. We have presented this approach via a proxy application but have since applied it in real-life research. Many challenges remain, from improving the behaviour of the automated interviewer through improving the accuracy of the causal coding process to dealing better with valence (for example distinguishing between “employment”, “employment issues” and “unemployment”). Perhaps most urgently needed are ways to better understand and counter how LLMs may reproduce hegemonic worldviews [@reidVisionEquitableAI2023].
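The valence problem is easy to reproduce: surface-similarity measures score "employment" and "unemployment" as near-identical even though they pull in opposite causal directions, so naive label matching or clustering risks merging them. A small illustration using only Python's standard library:

```python
import difflib

def surface_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib's ratio measure)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Antonyms with opposite valence look almost identical at the surface level
print(round(surface_similarity("employment", "unemployment"), 2))  # 0.91
```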